The Dreyfus Affair and Probability Theory

February 05, 2015

The Dreyfus Affair and Probability Theory: Pandora's Box

Introduction

In this article, we focus on a particular aspect of the Dreyfus Affair, a major event that took place in France at the end of the 19th century. Specifically, we examine the expert report prepared in 1904 by Henri Poincaré, Gaston Darboux, and Paul Appell, which detailed the errors of probabilistic reasoning in the analysis that Bertillon presented to the Court of Cassation in 1899.

The 100-page report written by these three academics contains several insights into the errors of probabilistic reasoning committed by Bertillon. Strikingly, the same mistakes were repeated time and again throughout the following century. The errors Bertillon made in 1899, which the report highlights, reappeared, for example, in the Sally Clark case exactly 100 years later (in 1999).

The objective of this article is to translate these observations into modern mathematical language, provide a simplified example of each error, and, to a lesser extent, reconstruct the calculations to arrive at the same result.

Attorney General Baudouin in his indictment in 1906:

"It was claimed that the battle was to be fought on scientific, mathematical grounds. Let the masters of this science be called upon to tell us what they think of the science of Messrs. Bertillon and company, of their calculations, of their deductions! And so, three of the leading members of the Institute were entrusted with this examination: Mr. Darboux, perpetual secretary of the Academy of Sciences; Mr. Appell, dean of the Faculty of Sciences of Paris; and Henri Poincaré, whose name alone is enough to bring him glory. These scholars, before whom the entire world bows and who are the pride of our country, were given all the documents, were allowed to gather all necessary information, hear all witnesses, and conduct all verifications."

Probability of Repeated Trials

To simplify the probability problem concerning repeated polysyllabic sequences, we can state an equivalent problem:

Suppose the probability of winning the lottery is $p = 0.2$. What is the probability of winning 4 times if a player plays 26 times in a row? Bertillon estimated this probability as $p^4 = 0.2^4 = 0.0016$, which is in fact the probability of winning four times in only four attempts.
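A quick simulation gives a feel for how far Bertillon's estimate is from the frequencies actually observed in this lottery model. The sketch below is only an illustration under stated assumptions: the function name `simulate_wins`, the number of trials, and the seed are arbitrary choices made here, not something taken from the report.

```python
import random

def simulate_wins(p=0.2, draws=26, trials=20_000, seed=1899):
    """Estimate how often a player wins exactly 4 times, and at least
    4 times, when playing `draws` independent lotteries."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for reproducibility
    exactly_four = 0
    at_least_four = 0
    for _ in range(trials):
        # Each draw is won independently with probability p.
        wins = sum(rng.random() < p for _ in range(draws))
        if wins == 4:
            exactly_four += 1
        if wins >= 4:
            at_least_four += 1
    return exactly_four / trials, at_least_four / trials

freq_exact, freq_at_least = simulate_wins()
print(f"Bertillon's estimate p^4:     {0.2 ** 4:.4f}")
print(f"simulated P(exactly 4 wins):  {freq_exact:.3f}")
print(f"simulated P(at least 4 wins): {freq_at_least:.3f}")
```

Both simulated frequencies come out orders of magnitude larger than 0.0016, because a player who plays 26 times has many different ways of collecting four wins.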

Poincaré indicated that the probability of finding four coincidences is 0.7. We do not know exactly how he arrived at this result, and we may never know. However, we can attempt to recover this value. The player played 26 times, so there are $C_{26}^4$ ways (the number of subsets of size 4 within a set of size 26) to choose the four winning draws. Each such choice, with exactly four wins and 22 losses, has probability $p^4(1-p)^{26-4}$. Writing $X$ for the number of wins in 26 draws, it follows that:

$P(X=4) = 0.2^4 \times 0.8^{22} \times C_{26}^4 \approx 0.18$
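The formula for exactly four wins can be evaluated directly. The following minimal sketch uses Python's standard `math.comb` for $C_{26}^4$; the helper name `binomial_pmf` is a hypothetical name introduced here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p, n = 0.2, 26
bertillon = p**4                      # Bertillon's estimate
exactly_four = binomial_pmf(4, n, p)  # exactly four wins in 26 draws

print(f"C(26, 4)              = {comb(n, 4)}")
print(f"Bertillon's p^4       = {bertillon:.4f}")
print(f"P(X = 4)              = {exactly_four:.4f}")
print(f"ratio P(X = 4) / p^4  = {exactly_four / bertillon:.0f}")
```

The ratio printed on the last line shows how much the count of possible positions for the four wins outweighs the extra factor $0.8^{22}$.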

Returning to the report's own terms, the probability of finding an isolated coincidence is 0.2, and the probability of finding four coincidences among the 26 initial and final positions of repeated polysyllabic sequences can be computed as follows.

CALCULATION DETAILS:

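$P(X=4) = 0.2^4 \times 0.8^{22} \times C_{26}^4$

This is the probability of exactly four coincidences. The probability of at least four is:

$P(X \geq 4) = \sum_{k=4}^{26} 0.2^k \times 0.8^{26-k} \times C_{26}^k \approx 0.79$

By the binomial theorem, the probabilities of all possible outcomes sum to 1:

$\sum_{k=0}^{26} 0.2^k \times 0.8^{26-k} \times C_{26}^k = (0.2 + 0.8)^{26} = 1$

so the sum can be obtained from its complement, which has only four terms:

$\sum_{k=4}^{26} 0.2^k \times 0.8^{26-k} \times C_{26}^k = 1 - \sum_{k=0}^{3} 0.2^k \times 0.8^{26-k} \times C_{26}^k$

Factoring $0.8^{23}$ out of these four terms:

$\sum_{k=0}^{3} 0.2^k \times 0.8^{26-k} \times C_{26}^k = 0.8^{23} \times (0.8^3 + 26 \times 0.2 \times 0.8^2 + 325 \times 0.2^2 \times 0.8 + 2600 \times 0.2^3) = 0.8^{23} \times 35.04$

$P(X \geq 4) = 1 - 0.8^{23} \times 35.04$

We can use a logarithm table to approximate $0.8^{23}$:

$\log(0.8^{23}) = 23 \times (\log 8 - \log 10) \approx 23 \times (0.9031 - 1) \approx -2.229$

Using an antilogarithm table to reverse the operation:

$0.8^{23} \approx 10^{-2.229} \approx 0.0059$

$P(X \geq 4) \approx 1 - 0.0059 \times 35.04 \approx 0.79$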
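The whole derivation can be checked numerically. The sketch below is an illustration rather than a reconstruction of the experts' method: it computes the tail sum directly, computes it again through the complement, and recomputes the factor multiplying $0.8^{23}$ and the base-10 logarithm instead of reading them from a table. The variable names are assumptions chosen here.

```python
from math import comb, log10

p, n = 0.2, 26

def pmf(k):
    """Probability of exactly k coincidences among n positions."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Tail probability, summed directly and through the complement.
direct = sum(pmf(k) for k in range(4, n + 1))
via_complement = 1 - sum(pmf(k) for k in range(4))

# Factor left over once 0.8**23 is pulled out of the terms k = 0..3.
factor = sum(comb(n, k) * p**k * (1 - p) ** (3 - k) for k in range(4))

# Logarithm-table route to 0.8**23, using base-10 logarithms.
log_value = 23 * (log10(8) - log10(10))
power = 10**log_value

print(f"P(X >= 4), direct sum:  {direct:.4f}")
print(f"P(X >= 4), complement:  {via_complement:.4f}")
print(f"factor of 0.8^23:       {factor:.2f}")
print(f"log10(0.8^23):          {log_value:.3f}")
print(f"0.8^23:                 {power:.4f}")
print(f"1 - 0.8^23 * factor:    {1 - power * factor:.4f}")
```

The direct sum and the complement agree to floating-point precision, which is a useful sanity check on the hand calculation.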